{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "909fc373",
   "metadata": {},
   "source": [
    "# Practice of Minor Language Text Recognition R&D Based on ERNIE 4.5 and PaddleOCR\n",
    "\n",
    "## 1. Background Introduction\n",
    "\n",
    "Since its release, PaddleOCR has received widespread attention due to its outstanding text recognition capabilities and end-to-end development abilities. During the full-process development, many users often need to obtain a large amount of annotated text line data. However, the high cost of data annotation often makes it difficult to meet this demand. Traditional data annotation processes rely heavily on manual labor, which is not only time-consuming and labor-intensive but also prone to subjective bias, resulting in inconsistent label accuracy. Especially in practical applications involving diverse scenarios and complex semantics, the difficulty of data acquisition and annotation increases further. This tutorial aims to address this issue by utilizing ERNIE 4.5 to achieve automatic annotation of text lines, thereby effectively improving the recognition performance of text recognition models in real-world scenarios.\n",
    "\n",
    "The automatic text recognition data annotation process based on ERNIE 4.5 is as follows: First, images containing text are collected. The PP-OCRv5 detection model of PaddleOCR is used to detect and locate the text lines in these images, and each line of text is cropped into an individual text line image. Then, ERNIE 4.5 is used to independently predict these images twice. Images with consistent results in both predictions are selected, and the corresponding recognition result is taken as the final ground truth label. This filtering mechanism can effectively avoid hallucination issues that may occur with large models, ensuring the accuracy and high quality of the automatically annotated data, and providing reliable data support for subsequent text recognition model training.\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/PP-OCRv5/cookbook/ocr_rec_data_labeled.png\" width=\"800\"/>\n",
    "</div>\n",
    "\n",
    "This tutorial uses a **Russian character recognition** dataset as an example to demonstrate how to achieve automatic data annotation based on ERNIE 4.5. The original Russian data used in this tutorial was collected from the internet. Users can use this [dataset](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/russian_dataset_demo.tar) for batch automatic annotation operations, quickly completing the high-quality annotation and training process."
   ]
  },
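  {
   "cell_type": "markdown",
   "id": "f1a2b3c4",
   "metadata": {},
   "source": [
    "The consistency-filtering rule described above can be sketched as a small helper. This is a minimal illustration only; the function name `accept_label` is hypothetical, and the `###` no-text token follows the prompt used later in this tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5e6f7a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "def accept_label(pred1: str, pred2: str, no_text_token: str = '###') -> bool:\n",
    "    # Keep a sample only when both independent predictions agree\n",
    "    # and the model did not report a no-text ('###') image.\n",
    "    return pred1 == pred2 and pred1 != no_text_token\n",
    "\n",
    "\n",
    "assert accept_label('privet', 'privet')       # consistent predictions are kept\n",
    "assert not accept_label('dom', 'tom')         # inconsistent predictions are discarded\n",
    "assert not accept_label('###', '###')         # no text detected in the image"
   ]
  },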
  {
   "cell_type": "markdown",
   "id": "03afa3ab",
   "metadata": {},
   "source": [
    "## 2. Environment Setup\n",
    "\n",
    "This project depends on PaddlePaddle, PaddleOCR, the OpenAI SDK, and common Python utility packages. Please ensure all required dependencies are installed before use. For detailed installation instructions, refer to the [Environment Setup Documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version3.x/installation.md)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e46ef631",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install openai matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40c45e31",
   "metadata": {},
   "source": [
    "## 3. Text Line Detection and Cropping\n",
    "\n",
    "Text detection is the first step in the OCR process. In this tutorial, the detection model PP-OCRv5_server_det is used to automatically locate each line of text in an image. Once located, the corresponding regions are cropped into individual text line images, which facilitates subsequent label prediction using ERNIE 4.5 and the training of text recognition models. This approach helps improve overall recognition accuracy and efficiency."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "504d80ce",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Obtain the Russian sample dataset\n",
    "!wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/russian_dataset_demo.tar\n",
    "!tar -xf russian_dataset_demo.tar"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69d60f7f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "import copy\n",
    "import glob\n",
    "import os\n",
    "import time\n",
    "\n",
    "import cv2\n",
    "import numpy as np\n",
    "from openai import OpenAI\n",
    "from tqdm import tqdm\n",
    "\n",
    "\n",
    "def get_rotate_crop_image(img: np.ndarray, points: list) -> np.ndarray:\n",
    "    \"\"\"\n",
    "    Crop and rotate the image region to obtain a small text line image after perspective transformation.\n",
    "    \"\"\"\n",
    "    assert len(points) == 4, \"shape of points must be 4*2\"\n",
    "    img_crop_width = int(\n",
    "        max(\n",
    "            np.linalg.norm(points[0] - points[1]),\n",
    "            np.linalg.norm(points[2] - points[3]),\n",
    "        )\n",
    "    )\n",
    "    img_crop_height = int(\n",
    "        max(\n",
    "            np.linalg.norm(points[0] - points[3]),\n",
    "            np.linalg.norm(points[1] - points[2]),\n",
    "        )\n",
    "    )\n",
    "    pts_std = np.float32(\n",
    "        [\n",
    "            [0, 0],\n",
    "            [img_crop_width, 0],\n",
    "            [img_crop_width, img_crop_height],\n",
    "            [0, img_crop_height],\n",
    "        ]\n",
    "    )\n",
    "    M = cv2.getPerspectiveTransform(points, pts_std)\n",
    "    dst_img = cv2.warpPerspective(\n",
    "        img,\n",
    "        M,\n",
    "        (img_crop_width, img_crop_height),\n",
    "        borderMode=cv2.BORDER_REPLICATE,\n",
    "        flags=cv2.INTER_CUBIC,\n",
    "    )\n",
    "    dst_img_height, dst_img_width = dst_img.shape[0:2]\n",
    "    if dst_img_height * 1.0 / dst_img_width >= 1.5:\n",
    "        dst_img = np.rot90(dst_img)\n",
    "    return dst_img\n",
    "\n",
    "\n",
    "def get_minarea_rect_crop(img: np.ndarray, points: np.ndarray) -> np.ndarray:\n",
    "    \"\"\"\n",
    "    Crop the minimum-area rectangular region from the detected set of points.\n",
    "    \"\"\"\n",
    "    bounding_box = cv2.minAreaRect(np.array(points).astype(np.int32))\n",
    "    points = sorted(cv2.boxPoints(bounding_box), key=lambda x: x[0])\n",
    "    index_a, index_b, index_c, index_d = 0, 1, 2, 3\n",
    "    if points[1][1] > points[0][1]:\n",
    "        index_a = 0\n",
    "        index_d = 1\n",
    "    else:\n",
    "        index_a = 1\n",
    "        index_d = 0\n",
    "    if points[3][1] > points[2][1]:\n",
    "        index_b = 2\n",
    "        index_c = 3\n",
    "    else:\n",
    "        index_b = 3\n",
    "        index_c = 2\n",
    "\n",
    "    box = [points[index_a], points[index_b], points[index_c], points[index_d]]\n",
    "    crop_img = get_rotate_crop_image(img, np.array(box))\n",
    "    return crop_img\n",
    "\n",
    "\n",
    "def crop_and_save(image_path, output_dir, ocr):\n",
    "    \"\"\"\n",
    "    Detect and crop all text lines in the image, and save them to output_dir.\n",
    "    \"\"\"\n",
    "    img = cv2.imread(image_path)\n",
    "    img_name = os.path.splitext(os.path.basename(image_path))[0]\n",
    "    result = ocr.predict(image_path)\n",
    "    try:\n",
    "        for res in result:\n",
    "            cnt = 0\n",
    "            for quad_box in res['dt_polys']:\n",
    "                img_crop = get_minarea_rect_crop(res['input_img'], copy.deepcopy(quad_box))\n",
    "                cv2.imwrite(os.path.join(output_dir, f\"{img_name}_crop{cnt:04d}.jpg\"), img_crop)\n",
    "                cnt += 1\n",
    "\n",
    "    except Exception as e:\n",
    "        print(f\"Process Failed with error: {e}\")\n",
    "\n",
    "\n",
    "# Usage example (assuming all your images are in the russian_dataset_demo/ directory)\n",
    "input_dir = 'russian_dataset_demo'\n",
    "output_dir = 'crops'  # The cropped images will be saved to this directory.\n",
    "os.makedirs(output_dir, exist_ok=True)\n",
    "\n",
    "image_paths = glob.glob(os.path.join(input_dir, '*.jpg')) + glob.glob(os.path.join(input_dir, '*.png'))\n",
    "\n",
    "# Batch processing\n",
    "from paddleocr import TextDetection\n",
    "\n",
    "ocr = TextDetection(\n",
    "    model_name=\"PP-OCRv5_server_det\",\n",
    "    device='gpu',\n",
    ")\n",
    "for path in tqdm(image_paths):\n",
    "    crop_and_save(path, output_dir, ocr)\n",
    "print(f\"Cropping completed, saved to the {output_dir} directory\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7629d782",
   "metadata": {},
   "source": [
    "### 3.2 Visualization of Cropping Results\n",
    "\n",
    "After cropping, it is recommended to randomly sample some of the small images to verify the detection and cropping quality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7414786",
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "crop_imgs = glob.glob(os.path.join(output_dir, '*.jpg'))\n",
    "\n",
    "if len(crop_imgs) >= 5:\n",
    "    show_imgs = random.sample(crop_imgs, 5)\n",
    "else:\n",
    "    show_imgs = crop_imgs  # Display all if there are fewer than 5 images\n",
    "\n",
    "for crop_path in show_imgs:\n",
    "    img = cv2.imread(crop_path)\n",
    "    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
    "    plt.imshow(img)\n",
    "    plt.title(os.path.basename(crop_path))\n",
    "    plt.axis('off')\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bdf36da",
   "metadata": {},
   "source": [
    "## 4. Using ERNIE 4.5 to Predict on Cropped Images\n",
    "\n",
    "By directly utilizing ERNIE 4.5, the efficiency and accuracy of automatic annotation for unstructured text images can be greatly improved:\n",
    "\n",
    "- No need for manual verification of each image; the large model directly outputs the text content.\n",
    "- Multi-round consistency checks effectively reduce the risks of hallucinations or misreading by the model.\n",
    "- Supports complex scenarios, such as handwritten, printed, cursive, and blurry text samples."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c18a40b",
   "metadata": {},
   "source": [
    "### 4.1 Deploying ERNIE 4.5 and Setting Key Parameters\n",
    "\n",
    "In this example, the ERNIE large model is invoked through service requests, so it needs to be deployed as a local service. The deployment can be accomplished using the FastDeploy tool, which is an inference deployment tool for large models open-sourced by PaddlePaddle. For deployment methods, please refer to the [FastDeploy official documentation](https://github.com/PaddlePaddle/FastDeploy).\n",
    "\n",
    "After deploying FastDeploy as a backend service, you need to fill in the service URL in the configuration below, and use a script to test the service. If the output includes “Test successful!”, it indicates the service deployment is available; otherwise, it means the service is unavailable. Please troubleshoot based on the error message."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66362805",
   "metadata": {},
   "outputs": [],
   "source": [
    "base_url = \"\"  # # Please fill in the URL of the local service, e.g., http://0.0.0.0:8000/v1\n",
    "model_name = \"xxx\"  # Select the model to invoke\n",
    "prompt = \"Identify the text content in the image and output it as plain text. If there is no text in the image, output ###. Do not explain, do not add line breaks, do not translate, and do not output extra content.\"  # Can be modified according to the actual situation\n",
    "api_key = \"api_key\"  # No modification is needed for local deployment\n",
    "\n",
    "try:\n",
    "    import openai\n",
    "\n",
    "    client = openai.OpenAI(base_url=base_url, api_key=api_key)\n",
    "    question = \"Who are you?\"\n",
    "    response1 = client.chat.completions.create(model=model_name, messages=[{\"role\": \"user\", \"content\": question}])\n",
    "    reply = response1.choices[0].message.content\n",
    "except Exception as e:  # Corrected from \"Exception()\" to \"Exception\"\n",
    "    print(f\"Test failed! The error message is:\\n{e}\")\n",
    "\n",
    "print(f\"Test successful!\\nQuestion: {question}\\nAnswer: {reply}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb816f86",
   "metadata": {},
   "source": [
    "### 4.2 Main Process of Automatic Label Generation Based on ERNIE 4.5\n",
    "\n",
    "This is an automated process for batch labeling of text line images cropped by the detection model: it automatically scans all images in a specified folder, calls ERNIE 4.5 to recognize the text content in each image, and saves the labeling results to an output file. The code also features breakpoint recovery, automatically skipping images that have already been processed, so that even if the process is interrupted, it can resume from where it left off and continue processing unfinished images. To ensure label accuracy, each image is inferred twice with two different prompts, and only if the results are consistent will the label be accepted as final. The process also supports automatic retry in case of exceptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "057cd817",
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "import os\n",
    "\n",
    "from tqdm import tqdm\n",
    "\n",
    "\n",
    "def encode_image(image_path):\n",
    "    \"\"\"Convert image to base64 string, compatible with multimodal API.\"\"\"\n",
    "    with open(image_path, \"rb\") as image_file:\n",
    "        return base64.b64encode(image_file.read()).decode(\"utf-8\")\n",
    "\n",
    "\n",
    "def initialize_client(api_key, base_url):\n",
    "    \"\"\"Initialize OpenAI compatible API client.\"\"\"\n",
    "    return OpenAI(api_key=api_key, base_url=base_url)\n",
    "\n",
    "\n",
    "def read_processed_images(output_file):\n",
    "    \"\"\"Read processed image paths to avoid duplicate labeling.\"\"\"\n",
    "    processed = set()\n",
    "    if os.path.exists(output_file):\n",
    "        with open(output_file, \"r\", encoding=\"utf-8\") as f:\n",
    "            for line in f:\n",
    "                if line.strip():\n",
    "                    image_path = line.split('\\t')[0]\n",
    "                    processed.add(image_path)\n",
    "    return processed\n",
    "\n",
    "\n",
    "def append_result(output_file, line):\n",
    "    \"\"\"Append label result to output file.\"\"\"\n",
    "    with open(output_file, \"a\", encoding=\"utf-8\") as f:\n",
    "        f.write(f\"{line}\\n\")\n",
    "\n",
    "\n",
    "# Suppose all your images are in the './crops' folder\n",
    "image_folder = './crops'  # Change to your image folder path\n",
    "image_list_file = 'image_list.txt'  # Store all image paths\n",
    "\n",
    "image_paths = glob.glob(os.path.join(image_folder, \"*.jpg\"))\n",
    "\n",
    "with open(image_list_file, \"w\", encoding=\"utf-8\") as f:\n",
    "    f.writelines(f\"{img_path}\\n\" for img_path in image_paths)\n",
    "\n",
    "print(f\"Collected {len(image_paths)} image paths and wrote to {image_list_file}\")\n",
    "\n",
    "output_file = \"label_output.txt\"\n",
    "\n",
    "max_retries = 3  # Maximum number of retries after failure\n",
    "\n",
    "LIMIT_PROMPT_SUFFIX = \"Please strictly output only the text content in the image, do not output any explanation or other content. Do not write in formula encoding format either.\"  # Restriction, used for the second prompt\n",
    "\n",
    "with open(image_list_file, \"r\", encoding=\"utf-8\") as f:\n",
    "    all_images = [line.strip() for line in f if line.strip()]\n",
    "client = initialize_client(api_key, base_url)\n",
    "processed_images = read_processed_images(output_file)\n",
    "remaining_images = [img for img in all_images if img not in processed_images]\n",
    "\n",
    "if not remaining_images:\n",
    "    print(\"All images have been processed.\")\n",
    "else:\n",
    "    with tqdm(total=len(remaining_images), desc=\"Batch Image Labeling\", unit=\"image\") as pbar:\n",
    "        for idx, image_path in enumerate(remaining_images, 1):\n",
    "            retries = 0\n",
    "            while retries < max_retries:\n",
    "                try:\n",
    "                    base64_image = encode_image(image_path)\n",
    "                    # First inference\n",
    "                    response1 = client.chat.completions.create(\n",
    "                        model=model_name,\n",
    "                        messages=[\n",
    "                            {\n",
    "                                \"role\": \"user\",\n",
    "                                \"content\": [\n",
    "                                    {\"type\": \"text\", \"text\": prompt},\n",
    "                                    {\n",
    "                                        \"type\": \"image_url\",\n",
    "                                        \"image_url\": {\"url\": f\"data:image/jpeg;base64,{base64_image}\"},\n",
    "                                    },\n",
    "                                ],\n",
    "                            }\n",
    "                        ],\n",
    "                        stream=False,\n",
    "                    )\n",
    "                    rec_text1 = response1.choices[0].message.content.strip()\n",
    "\n",
    "                    # Second inference\n",
    "                    response2 = client.chat.completions.create(\n",
    "                        model=model_name,\n",
    "                        messages=[\n",
    "                            {\n",
    "                                \"role\": \"user\",\n",
    "                                \"content\": [\n",
    "                                    {\"type\": \"text\", \"text\": prompt + LIMIT_PROMPT_SUFFIX},\n",
    "                                    {\n",
    "                                        \"type\": \"image_url\",\n",
    "                                        \"image_url\": {\"url\": f\"data:image/jpeg;base64,{base64_image}\"},\n",
    "                                    },\n",
    "                                ],\n",
    "                            }\n",
    "                        ],\n",
    "                        stream=False,\n",
    "                    )\n",
    "                    rec_text2 = response2.choices[0].message.content.strip()\n",
    "\n",
    "                    # Compare two results\n",
    "                    if rec_text1 == rec_text2 and rec_text1 != \"###\":\n",
    "                        result_line = f\"{image_path}\\t{rec_text1}\"\n",
    "                        append_result(output_file, result_line)\n",
    "                        print(\n",
    "                            f\"Successfully processed image: {image_path} ({idx}/{len(remaining_images)}), both results are consistent.\"\n",
    "                        )\n",
    "                        break  # Success, break retry loop\n",
    "                    else:\n",
    "                        print(\n",
    "                            f\"Image {image_path} two results are inconsistent or there is no text in the image, discarded. Result 1: {rec_text1}, Result 2: {rec_text2}\"\n",
    "                        )\n",
    "                        break  # No more retries, just skip\n",
    "                except Exception as e:\n",
    "                    retries += 1\n",
    "                    print(f\"Error processing image {image_path} (attempt {retries}/{max_retries}): {e}\")\n",
    "                    if retries < max_retries:\n",
    "                        sleep_time = 2**retries\n",
    "                        print(f\"Retrying after waiting {sleep_time} seconds.\")\n",
    "                        time.sleep(sleep_time)\n",
    "                    else:\n",
    "                        print(f\"Image {image_path} failed after reaching maximum retries.\")\n",
    "            pbar.set_postfix({\"Current image\": os.path.basename(image_path)})\n",
    "            pbar.update(1)\n",
    "\n",
    "print(\"All processing completed. Results saved to\", output_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a68999b9",
   "metadata": {},
   "source": [
    "### 4.3. View the final tag results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0d4513e",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(output_file, \"r\", encoding=\"utf-8\") as f:\n",
    "    for i, line in enumerate(f):\n",
    "        print(line.strip())\n",
    "        if i > 9:  # Only display the first 10 lines, the rest are omitted\n",
    "            print('......')\n",
    "            break"
   ]
  },
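  {
   "cell_type": "markdown",
   "id": "a9b8c7d6",
   "metadata": {},
   "source": [
    "The label file written above is tab-separated, with one image path and one text label per line. A minimal sketch of loading it into `(path, text)` pairs for downstream checks (the helper name `load_labels` is illustrative, not part of PaddleOCR):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1f2a3b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_labels(label_file):\n",
    "    # Parse the tab-separated label file into (image_path, text) pairs,\n",
    "    # skipping empty lines.\n",
    "    pairs = []\n",
    "    with open(label_file, 'r', encoding='utf-8') as f:\n",
    "        for line in f:\n",
    "            line = line.rstrip('\\n')\n",
    "            if not line:\n",
    "                continue\n",
    "            path, _, text = line.partition('\\t')\n",
    "            pairs.append((path, text))\n",
    "    return pairs"
   ]
  },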
  {
   "cell_type": "markdown",
   "id": "315f1201",
   "metadata": {},
   "source": [
    "The output format is: image path\\trecognized text, which can be directly used for OCR training or manual verification. The results after running are shown as follows:\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/PP-OCRv5/cookbook/labeled_show.png\" width=\"600\"/>\n",
    "</div>\n",
    "\n",
    "## 5. Training a Russian Text Recognition Model Based on Labeled Data\n",
    "\n",
    "We obtained a large amount of high-quality labeled data by acquiring text line images through the text detection model and automating data labeling with ERNIE 4.5, which can effectively support the training of the Russian text recognition model.\n",
    "\n",
    "### 5.1 Clone the PaddleOCR Repository and Initiate Training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ee9d699",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clone the PaddleOCR repository\n",
    "!git clone https://github.com/PaddlePaddle/PaddleOCR.git\n",
    "\n",
    "# Install dependencies required for training\n",
    "%pip install -r PaddleOCR/requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0bb032aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Start Training\n",
    "# When passing in the dataset path, please ensure that the evaluation dataset has been prepared in advance. For demonstration purposes, the evaluation set and the training set are set to be the same dataset here.\n",
    "\n",
    "!python PaddleOCR/tools/train.py -c PaddleOCR/configs/rec/PP-OCRv5/multi_language/eslav_PP-OCRv5_mobile_rec.yml -o Train.dataset.data_dir=./  Train.dataset.label_file_list=./label_output.txt Eval.dataset.data_dir=./  Eval.dataset.label_file_list=./label_output.txt Global.epoch_num=20 Global.character_dict_path=PaddleOCR/ppocr/utils/dict/ppocrv5_eslav_dict.txt Global.pretrained_model=https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-OCRv5_mobile_rec_pretrained.pdparams Global.eval_batch_step=100 Train.loader.batch_size_per_card=32  Eval.loader.batch_size_per_card=32 Train.sampler.first_bs=32"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43dc028f",
   "metadata": {},
   "source": [
    "The model weights are saved in the directory `./output/eslav_rec_ppocr_v5`. For more information on how to initiate training, please refer to the [documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version2.x/ppocr/model_train/recognition.md).\n",
    "\n",
    "### 5.2 Model Export\n",
    "\n",
    "During training, the saved models are checkpoints, which contain only the model parameters and are mainly used for tasks such as resuming training. The inference model (saved using `paddle.jit.save`) is primarily used for prediction and deployment scenarios. Compared with the checkpoints generated during training, the inference model additionally saves the model's structural information, making it superior in prediction deployment and accelerated inference. It is also flexible and convenient, suitable for integration into real-world systems.\n",
    "\n",
    "The method for converting a recognition model to an inference model is as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "47dca202",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The Global.pretrained_model parameter sets the address of the training model to be converted.\n",
    "# The Global.save_inference_dir parameter sets the address where the converted model will be saved.\n",
    "\n",
    "!python3 PaddleOCR/tools/export_model.py -c PaddleOCR/configs/rec/PP-OCRv5/multi_language/eslav_PP-OCRv5_mobile_rec.yml -o Global.pretrained_model=./output/eslav_rec_ppocr_v5/best_accuracy Global.save_inference_dir=./inference/eslav_PP-OCRv5_mobile_rec_infer/ Global.character_dict_path=PaddleOCR/ppocr/utils/dict/ppocrv5_eslav_dict.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0ef29f8",
   "metadata": {},
   "source": [
    "After successful conversion, there are three files under the directory:\n",
    "\n",
    "```\n",
    "inference/eslav_PP-OCRv5_mobile_rec_infer/\n",
    "    ├── inference.pdiparams         # Parameter file for the recognition inference model\n",
    "    └── inference.json              # Program file for the recognition inference model\n",
    "    └── inference.yaml              # Configuration file for the recognition inference model\n",
    "```"
   ]
  },
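  {
   "cell_type": "markdown",
   "id": "c5d6e7f8",
   "metadata": {},
   "source": [
    "Before using the exported model, a quick sanity check that the three files are present can catch a failed export early. This is a minimal sketch (the helper name `check_export` is illustrative; the path matches the `save_inference_dir` used above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4a3c2d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "\n",
    "def check_export(export_dir):\n",
    "    # Return the list of expected inference files that are missing.\n",
    "    expected = ['inference.pdiparams', 'inference.json', 'inference.yaml']\n",
    "    return [f for f in expected if not os.path.exists(os.path.join(export_dir, f))]\n",
    "\n",
    "\n",
    "missing = check_export('./inference/eslav_PP-OCRv5_mobile_rec_infer/')\n",
    "print('Export looks complete' if not missing else f'Missing files: {missing}')"
   ]
  },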
  {
   "cell_type": "markdown",
   "id": "00256eee",
   "metadata": {},
   "source": [
    "### 5.3 Model Prediction\n",
    "\n",
    "Use the exported static graph model to predict images of Russian text lines. You can download the [test image](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/demo_images/labeled_test.jpg) and use the following code for prediction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a11d9e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!paddleocr text_recognition -i https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/demo_images/labeled_test.jpg --model_name eslav_PP-OCRv5_mobile_rec --model_dir ./inference/eslav_PP-OCRv5_mobile_rec_infer/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17d51875",
   "metadata": {},
   "source": [
    "The prediction results are saved in the `./output` directory, and the visualization results are shown in the figure: \n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/PP-OCRv5/cookbook/labeled_test_res.jpg\" width=\"400\"/>\n",
    "</div>\n",
    "\n",
    "The exported static graph model can also be integrated into PP-OCRv5. It should be noted that the text detection model does not need to be trained separately for minor languages, as it already possesses strong text feature detection capabilities. You can directly use the PP-OCRv5_server_det model for text line detection and refer to the following code for prediction.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "989a79d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/demo_images/ru_pipeline_test.jpg --text_detection_model_name PP-OCRv5_server_det --text_recognition_model_name eslav_PP-OCRv5_mobile_rec --text_recognition_model_dir inference/eslav_PP-OCRv5_mobile_rec_infer/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f671ea0",
   "metadata": {},
   "source": [
    "\n",
    "The prediction results are saved in the ./output directory. The visualization of the results is shown in the figure:\n",
    "\n",
    "<div align=\"center\">\n",
    "<img src=\"https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/PP-OCRv5/cookbook/ru_pipeline_result.jpg\" width=\"600\"/>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbefb10d",
   "metadata": {},
   "source": [
    "## 6. Summary\n",
    "\n",
    "This tutorial, based on Russian text recognition, systematically demonstrates the complete R&D workflow for low-resource language text recognition tasks using ERNIE 4.5 and PaddleOCR. The process includes text line detection and cropping, automatically generating high-quality labels for the cropped images with ERNIE 4.5, and finally training and inference of the text recognition model. By following the steps in this tutorial, you can not only quickly complete automatic data labeling for specific scenarios, but also efficiently train text recognition models tailored to your own needs—significantly reducing manual labeling costs and improving development efficiency.\n",
    "\n",
    "It should be noted that the number of sample images used in this tutorial is relatively small, so the accuracy of the resulting model may be limited. The main purpose of the tutorial is to help you become familiar with the complete workflow of image labeling and model training based on ERNIE 4.5. In practical applications, if you wish to achieve higher model accuracy and stronger generalization ability, it is recommended to use more and richer image data for labeling and training. This will allow you to fully leverage the advantages of large models and achieve better recognition results in real-world scenarios."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b5e0208",
   "metadata": {},
   "source": [
    "## 7. Frequently Asked Questions and Optimization Suggestions\n",
    "\n",
    "- What should I do if the image detection or cropping results are unsatisfactory?\n",
    "  - Check if the image resolution is too low. It is recommended to enlarge the image appropriately and try again.\n",
    "  - Try fine-tuning the parameters of the PP-OCRv5 model, such as `box_thresh`, `unclip_ratio`, etc. For specific parameter adjustment methods, please refer to the [documentation](https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version3.x/module_usage/text_detection.md).\n",
    "- What if an error occurs or the recognition results are abnormal when calling ERNIE?\n",
    "  - Check whether the service URL is filled in correctly and whether the port is open.\n",
    "  - Try modifying or optimizing the prompt content. Clearly specify \"output text only\" to prevent the model from outputting explanatory content.\n",
    "  - Check whether the image’s base64 encoding method and format are consistent with the API requirements.\n",
    "- Other suggestions and precautions\n",
    "  - Resume from breakpoint: The data labeling process supports resuming from breakpoints. After interruption, simply restart the script to automatically skip already processed images and avoid repeated labeling.\n",
    "  - Multi-process acceleration: When there are a large number of images, you can use multi-processing to improve labeling efficiency.\n",
    "  - Consistent label format with training configuration: Make sure the output label format (such as separating image path and text content with a `\\t`) is consistent with the PaddleOCR training script requirements."
   ]
  }
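  ,
  {
   "cell_type": "markdown",
   "id": "9a8b7c6d",
   "metadata": {},
   "source": [
    "As noted above, labeling can be parallelized when there are many images. A minimal sketch using a thread pool (the API calls are I/O-bound, so threads usually suffice; `label_one` is a hypothetical wrapper around the two-pass request from Section 4.2 that returns the accepted text or `None`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e6f5a4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from concurrent.futures import ThreadPoolExecutor\n",
    "\n",
    "\n",
    "def label_batch(image_paths, label_one, max_workers=8):\n",
    "    # Run the (I/O-bound) labeling calls concurrently and keep only\n",
    "    # the samples that passed the two-pass consistency check.\n",
    "    with ThreadPoolExecutor(max_workers=max_workers) as pool:\n",
    "        results = list(pool.map(label_one, image_paths))\n",
    "    return [(p, t) for p, t in zip(image_paths, results) if t is not None]"
   ]
  }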
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
